Cross- vs Within-Company Defect Prediction Studies

Authors

  • Tim Menzies
  • Burak Turhan
  • Ayse Bener
  • Justin Distefano
Abstract

In a recent (May 2007) IEEE TSE article, Kitchenham et al. explored effort estimation and found contradictory evidence about the value of cross- vs within-company data. Those contradictory results may have been caused by the effort estimation features themselves, some of which are subjective in nature. Static code features differ from effort estimation features: they can be generated in an automatic, rapid, and uniform manner across multiple projects. Therefore, in theory, the conclusions reached from such features may be more uniform. This paper tests that theory by searching for uniform conclusions using cross- or within-company static code features. Whereas Kitchenham et al. explored effort estimation, this paper explores defect prediction. Cross-company static code features are found to generate higher false alarm rates than within-company data. Hence, cross-company data is best used for mission-critical software, where (a) the extra costs associated with high false alarm rates are compensated by (b) an associated increase in the probability of predicting faulty modules. For other classes of software, false alarm rates can be decreased using a very small amount of local data (often, just 100 modules). In our experiments, the use of within-company data halved the false alarm rate while decreasing prediction rates by only ≈ 10%. Hence, for non-mission-critical software, we strongly recommend using within-company data for defect prediction.
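The trade-off described in the abstract is between the prediction (detection) rate and the false alarm rate. As a minimal sketch, assuming the usual confusion-matrix definitions (detection rate = TP / (TP + FN), false alarm rate = FP / (FP + TN)), the two measures could be computed as follows. The function name `detection_rates` and the toy labels are hypothetical and are not taken from the paper.

```python
def detection_rates(actual, predicted):
    """Return (pd, pf) for binary module labels, where True means defective.

    pd: probability of detection = TP / (TP + FN)
    pf: probability of false alarm = FP / (FP + TN)
    These are the standard definitions, assumed here for illustration.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # defective, flagged
    fn = sum(1 for a, p in pairs if a and not p)      # defective, missed
    fp = sum(1 for a, p in pairs if not a and p)      # clean, flagged
    tn = sum(1 for a, p in pairs if not a and not p)  # clean, not flagged
    pd = tp / (tp + fn) if (tp + fn) else 0.0
    pf = fp / (fp + tn) if (fp + tn) else 0.0
    return pd, pf


# Toy, made-up labels for eight modules (not data from the paper).
actual = [True, True, True, True, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False]
pd, pf = detection_rates(actual, predicted)
print(f"pd={pd:.2f} pf={pf:.2f}")
```

Under these definitions, a predictor that flags more modules tends to raise both values together, which is why a higher detection rate from cross-company data can come with a higher false alarm rate.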


Related resources

Using Class Imbalance Learning for Cross-Company Defect Prediction

Cross-company defect prediction (CCDP) is a practical approach that trains a prediction model on one or more projects of a source company and then applies the model to a target company. Unfortunately, the performance of such CCDP models is susceptible to the highly imbalanced nature of the defect-prone and non-defect classes in CC data. Class imbalance learning is applied to alleviat...


Transfer learning for cross-company software defect prediction

doi:10.1016/j.infsof.2011.09.007. Context: Software defect prediction studies usually built models using within-company data, but very few focused on the prediction models trained with cross-company da...


A Data Filtering Method Based on Agglomerative Clustering

Cross-company defect prediction (CCDP) is a practical approach that trains a prediction model on one or more projects of a source company and then applies the model to a target company. Unfortunately, a large amount of irrelevant cross-company (CC) data usually makes it difficult to build a cross-company defect prediction model with high performance. To address such issues, this paper proposes a dat...


A Multi-Source TrAdaBoost Approach for Cross-Company Defect Prediction

Cross-company defect prediction (CCDP) is a practical approach that trains a prediction model on one or more projects of a source company and then applies the model to a target company. Unfortunately, a large amount of irrelevant cross-company (CC) data usually makes it difficult to build a prediction model with high performance. On the other hand, brute force leveraging of CC data poorly related t...


Negative samples reduction in cross-company software defects prediction

Context: Software defect prediction has been widely studied using various machine-learning algorithms. Previous studies usually focus on within-company defect prediction (WCDP), but a lack of training data in the early stages of software testing limits the efficiency of WCDP in practice. Thus, recent research has largely examined cross-company defect prediction (CCDP) as an alternative s...




Publication date: 2007